Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia

Abstract

Wikipedia's content is based on reliable and published sources. To this date, relatively little is known about what sources Wikipedia relies on, in part because extracting citations and identifying cited sources is challenging. To close this gap, we release Wikipedia Citations, a comprehensive data set of citations extracted from Wikipedia. We extracted 29.3 million citations from 6.1 million English Wikipedia articles as of May 2020, and classified them as being books, journal articles, or Web content. We were thus able to extract 4.0 million citations to scholarly publications with known identifiers (including DOI, PMC, PMID, and ISBN) and further equip an extra 261 thousand citations with DOIs from Crossref. As a result, we find that 6.7% of Wikipedia articles cite at least one journal article with an associated DOI, and that Wikipedia cites just 2% of all articles with a DOI currently indexed in the Web of Science. We release our code to allow the community to extend upon our work and update the data set in the future.
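To make the idea of "citations with identifiers" concrete, the following is a minimal sketch of reading identifier fields out of a single wikitext citation template. It is not the authors' released code: the helper name `extract_identifiers`, the field list, and the example template are assumptions chosen for illustration, and the naive split on `|` would not handle nested templates or piped wikilinks.

```python
# Hypothetical illustration only; not the pipeline described in the abstract.
ID_FIELDS = ("doi", "pmc", "pmid", "isbn")  # identifier types named in the abstract


def extract_identifiers(template: str) -> dict:
    """Return the identifier parameters found in one {{cite ...}} template."""
    body = template.strip()
    # Drop the surrounding double braces if present.
    if body.startswith("{{") and body.endswith("}}"):
        body = body[2:-2]
    found = {}
    # Parameters are separated by "|" and written as name=value;
    # the first chunk is the template name, so skip it.
    for part in body.split("|")[1:]:
        name, sep, value = part.partition("=")
        key = name.strip().lower()
        value = value.strip()
        if sep and key in ID_FIELDS and value:
            found[key] = value
    return found


# Made-up citation with placeholder identifier values.
example = (
    "{{cite journal |title=An example |journal=Example Journal "
    "|doi=10.1000/xyz123 |pmid=12345678}}"
)
print(extract_identifiers(example))
```

A citation for which this returns an empty dictionary has no identifier in the template itself; looking up a DOI for such a citation in an external service, as the abstract describes for Crossref, would be a separate step not shown here.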

Related resources

Scientific citations in Wikipedia

The Internet-based encyclopædia Wikipedia has grown to become one of the most visited web-sites on the Internet. However, critics have questioned the quality of entries, and an empirical study has shown Wikipedia to contain errors in a 2005 sample of science entries. Biased coverage and lack of sources are among the “Wikipedia risks”. The present work describes a simple assessment of these aspe...

Clustering of scientific citations in Wikipedia

The instances of templates in Wikipedia form an interesting data set of structured information. Here I focus on the cite journal template that is primarily used for citation to articles in scientific journals. These citations can be extracted and analyzed: Non-negative matrix factorization is performed on a (article × journal) matrix resulting in a soft clustering of Wikipedia articles and scie...

Wikipedia as a gateway to biomedical research: The relative distribution and use of citations in the English Wikipedia

Wikipedia is a gateway to knowledge. However, the extent to which this gateway ends at Wikipedia or continues via supporting citations is unknown. Wikipedia's gateway functionality has implications for information design and education, notably in medicine. This study aims to establish benchmarks for the relative distribution and referral (click) rate of citations, as indicated by presence of a D...

Large SMT data-sets extracted from Wikipedia

The article presents experiments on mining Wikipedia for extracting SMT useful sentence pairs in three language pairs. Each extracted sentence pair is associated with a cross-lingual lexical similarity score based on which, several evaluations have been conducted to estimate the similarity thresholds which allow the extraction of the most useful data for training three-language pairs SMT system...

E-citations: actionable identifiers and scholarly referencing

This document discusses the role of "actionable" identifiers such as the Digital Object Identifier (DOI) in enabling scholarly citations in a digital environment. Citation is a sub-set of the general wider concept of linkage, but an interesting one for two reasons: it is a practical example being worked on today, and it demonstrates that linkage only between digital entities is insufficient for...

Journal

Journal title: Quantitative Science Studies

Year: 2021

ISSN: 2641-3337

DOI: https://doi.org/10.1162/qss_a_00105